Developing phoneme‐based lip‐reading sentences system for silent speech recognition

Abstract

Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper attempts to use phonemes as a classification schema for lip-reading sentences, both to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end model consists of a spatio-temporal (3D) convolution followed by a 2D ResNet. The phoneme models are Transformers that utilise multi-headed attention. For the language model, a Recurrent Neural Network is used. The performance of the proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system has demonstrated improved performance, with a 10% lower error rate on average under varying illumination ratios.
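To make the difference between the classification schemas concrete, here is a minimal sketch in Python. It is not the paper's implementation: the three-word lexicon, the ARPAbet-style phoneme symbols, the viseme class names, and the helper functions are all hypothetical and chosen only for illustration. The sketch turns the same words into character, phoneme, and viseme sequences, then scores a hypothesis against a reference with an edit-distance error rate, which is one common way to compute the kind of error rate the abstract reports.

```python
# Hypothetical toy lexicon (ARPAbet-style symbols); not taken from the paper.
LEXICON = {
    "pat": ["P", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "mat": ["M", "AE", "T"],
}

# Hypothetical many-to-one phoneme-to-viseme map: the bilabials P, B and M
# look alike on the lips, so they share a single visual class.
PHONEME_TO_VISEME = {
    "P": "V_bilabial",
    "B": "V_bilabial",
    "M": "V_bilabial",
    "AE": "V_open",
    "T": "V_alveolar",
}


def to_characters(words):
    """Character-based schema: one token per letter."""
    return [ch for word in words for ch in word]


def to_phonemes(words):
    """Phoneme-based schema: look each word up in the toy lexicon."""
    return [ph for word in words for ph in LEXICON[word]]


def to_visemes(words):
    """Viseme-based schema: collapse phonemes into visual classes."""
    return [PHONEME_TO_VISEME[ph] for ph in to_phonemes(words)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference, hypothesis = ["pat"], ["bat"]
    for name, encode in [("characters", to_characters),
                         ("phonemes", to_phonemes),
                         ("visemes", to_visemes)]:
        print(name, error_rate(encode(reference), encode(hypothesis)))
```

In this toy setting, "pat" and "bat" differ by one token out of three under the character and phoneme schemas, but map to the same viseme sequence. This shows why a viseme-based schema alone cannot separate such words and has to rely on a language model to disambiguate them.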

Similar resources

Speech Recognition System For Spoken Japanese Sentences

A speech recognition system for continuously spoken Japanese simple sentences is described. The acoustic analyser based on a psychological assumption for phoneme identification can represent the speech sound by a phoneme string in an expanded sense which contains acoustic features such as buzz and silence as well as ordinary phonemes. Each item of the word dictionary is written in Roman letters...

The MUTE silent speech recognition system

sEMG based silent speech recognition has become a desirable communication modality because it has the potential to provide natural, covert, hands-free communication in acoustically challenging environments. To enable this capability, we have developed a portable, self-contained, Android based Mouthed-speech Understanding and Transcription Engine (MUTE) system. To demonstrate the MUTE system’s a...

Towards a practical silent speech recognition system

Our recent efforts towards developing a practical surface electromyography (sEMG) based silent speech recognition interface have resulted in significant advances in the hardware, software and algorithmic components of the system. In this paper, we report our algorithmic progress, specifically: sEMG feature extraction parameter optimization, advances in sEMG acoustic modeling, and sEMG sensor se...

Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading

Audio-visual Speech Recognition (AVSR), which employs both video and audio information to perform Automatic Speech Recognition (ASR), is one of the applications of multimodal learning, making ASR systems more robust and accurate. The traditional models usually treated AVSR as inference or projection but strict prior limits its ability. As the revival of deep learning, Deep Neural Networks (DNN) be...

Multi-pose lipreading and audio-visual speech recognition

In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose with relation to the camera. To handle these situations, we introduce a pose normalization b...

Journal

Journal title: CAAI Transactions on Intelligence Technology

Year: 2022

ISSN: 2468-2322, 2468-6557

DOI: https://doi.org/10.1049/cit2.12131